gh-90533: Implement BytesIO.peek() by marcelm · Pull Request #30808 · python/cpython

marcelm · 2022-01-22T22:08:24Z

bedevere-bot · 2022-08-05T22:44:49Z

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

marcelm · 2022-11-09T13:16:15Z

@AlexWaygood You’ve been the only human to interact with this PR so far, do you possibly have any advice on how to move this forward?

AlexWaygood · 2022-11-09T13:36:44Z

Hi @marcelm — sorry for the delay in anybody looking at this. I haven't studied your PR in detail (or thought about whether the proposal is a good idea), but it looks well put together at first glance.

I'll try to take a look soon. I won't be able to review the C code, but I can comment on whether the proposal seems like a good idea, and I can review the Python implementation and the tests.

Note that if this proposal is accepted, it will also need:

Updates to the docs for BytesIO (Doc/library/io.rst)
An entry in "What's new in Python 3.12" (Doc/whatsnew/3.12.rst)

You can also add yourself to Misc/ACKS as part of this PR, and give yourself credit in the NEWS entry, if you like. (Neither is compulsory.)

marcelm · 2022-11-09T21:29:39Z

Thank you! I pushed a documentation update and will add an entry to the What’s new document in case the PR is reviewed favorably.

AlexWaygood · 2022-11-28T12:25:57Z

I'll defer to @benjaminp's judgement on this one -- I'm really not the right reviewer for this, unfortunately :(

Please ping me again if you still haven't had a review in a few weeks.

marcelm · 2022-11-28T12:43:35Z

Thank you! I appreciate that you took the time.

awalgarg · 2023-07-08T09:38:15Z

Please ping me again if you still haven't had a review in a few weeks.

@AlexWaygood Hi! Been 8 months, no review so far :) How can we take this forward? I'm not the original author of the PR but I also have a use-case for this API and would like to see this change land, happy to help progress this.

marcelm · 2023-07-08T13:14:48Z

Thanks for your interest! I have updated the PR to fix the merge conflict and to reflect that it now needs to target 3.13.

awalgarg · 2023-07-08T13:23:04Z

@marcelm That was quick, thank you so much!

AA-Turner · 2023-09-21T03:52:35Z

cc: @erlend-aasland @vstinner @benjaminp for the io C code.

A

vstinner · 2023-09-21T11:45:16Z

+      .. versionadded:: 3.13
+
+      Return bytes from the current position onwards but without advancing the
+      position. The number of bytes returned may be less or more than requested.


Wait. Why can it more than requested?

Apparently size=0 means "read all". Please document it.

This was just copied from the documentation for BufferedReader.peek(), but I agree that it could be written better. Changed now.

Wait. Why can it more than requested?

BufferedReader.peek() ignores the size argument and just returns whatever it has in its internal buffer.

I would prefer that BytesIO.peek() documents its behavior, rather than BufferedReader.peek() behavior.

Absolutely, it’s already changed.

(BTW, is there a policy on who clicks the "Resolve conversation" button?)

vstinner · 2023-09-21T11:47:32Z

+        pos = self.tell()
+        if size == 0:
+            size = -1
+        b = self.read(size)


Would it be possible to implement it without touching the position? This code is not thread safe. I don't know if it's supposed to be thread-safe. Maybe add a private read method which has an argument to decide to move the position or not.

Same remark for C code.

Good point; I implemented this now so that the position is not changed. I factored out a peek_bytes function in the C version that does not advance the position.

It seems though that neither the C nor the Python version of BytesIO are supposed to be thread safe. (They don’t use locks as e.g. BufferedReader does.) So I would suggest making them so would be a task for a different PR.

vstinner

I asked BytesIO.peek(0) to return an empty string, but I didn't notice that BufferedReader.peek(0) returns non-empty string. Now I'm unhappy that the two classes have an inconsistent behavior on peek(0), sorry.

>>> f=open("/etc/issue", "rb")
>>> f.peek(0)
b'\\S\nKernel \\r on \\m (\\l)\n\n'

On a short file, BufferedReader.peek(0) even returned the whole file contents!

Since BufferedReader.peek(0) behavior already exists, I would suggest to use the same behavior on BytesIO.peek(): return non-empty string.

cc @cmaloney

Co-authored-by: Victor Stinner <vstinner@python.org>

marcelm · 2026-04-13T12:55:05Z

Since BufferedReader.peek(0) behavior already exists, I would suggest to use the same behavior on BytesIO.peek(): return non-empty string.

Sure! I have changed it so that the rest of the buffer is returned for size=0.

The signature is still such that the default is size=1. Maybe that is actually a good compromise?

If desired, an explicit peek(0) returns a full (possibly expensive) copy - but the default is cheap.
peek() without arguments returns a single byte, which is the minimum necessary for compatibility with BufferedReader.peek(). Note that the latter also can return a single byte when that is how much is left in the buffer:

$ python3 -c 'f=open("README.rst", "rb");f.read(8191);print(f.peek())'
b'r'

If more than a single byte but not the full buffer is needed, an explicit size can be set.

Alternatively, the default could become size=0, but then the question is how huch of the buffer to return (which is where this PR got stalled last time).

marcelm · 2026-04-15T10:23:18Z

I have looked into the test failures on free-threaded Python but don’t know what is going on.

The _Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED(self) assertion in peek_bytes_lock_held fails. That function was factored out from read_bytes_lock_held.

vstinner · 2026-04-15T13:28:30Z

I have looked into the test failures on free-threaded Python but don’t know what is going on.

You must decorate _io.BytesIO.peek with @critical_section:

diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
index f43cf1f7e41..98f31f3d1a8 100644
--- a/Modules/_io/bytesio.c
+++ b/Modules/_io/bytesio.c
@@ -511,6 +511,7 @@ _io_BytesIO_read1_impl(bytesio *self, Py_ssize_t size)
 
 
 /*[clinic input]
+@critical_section
 _io.BytesIO.peek
     size: Py_ssize_t = 1
     /

marcelm · 2026-04-15T20:28:44Z

You must decorate _io.BytesIO.peek with @critical_section:

Thanks, that worked!

marcelm · 2026-04-21T07:07:07Z

Is anything else needed? As far as I understand, this PR would need to be merged before beta 1 in two weeks in order to get this into 3.15.

cmaloney · 2026-04-21T07:32:56Z


      Return :class:`bytes` containing the entire contents of the buffer.

+   .. method:: peek(size=1, /)


Why not default to size=0 like BufferedReader?

Read through the comments. In general anything returning a bytes rather than a bytes-like memoryview is going to require allocating and copying potentially many bytes... If the copy is really concerning I'd lean returning a memoryview rather than a bytes which is a mandatory copy.

memoryview came up in this comment and the three following it: #30808 (comment)

I don’t know what the conclusion is given your comment. Should a memoryview be returned instead? Most important to me is compatibility with what BufferedReader.peek() returns.

I am not too concerned about the extra memory for a bytes object as long as the default is size=1.

👍 to not using memoryview / definitely a good rationale in that thread.

On 3.14.0 BufferedReader (intentionally limiting buffer size) gives:

Python 3.14.0 (v3.14.0:ebf955df7a8, Oct 7 2025, 08:20:14) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> br = open('README.rst', 'rb', buffering=30) >>> def show_start_end(buf): ... print(f'{len(buf)=}|{buf=}') ... >>> show_start_end(br.peek()) len(buf)=30|buf=b'This is Python version 3.15.0 ' >>> show_start_end(br.peek(0)) len(buf)=30|buf=b'This is Python version 3.15.0 ' >>> show_start_end(br.peek(1)) len(buf)=30|buf=b'This is Python version 3.15.0 ' >>> _ = br.read(len(br.peek())) >>> show_start_end(br.peek()) len(buf)=30|buf=b'alpha 7\n======================'

Which doesn't match the behavior here or defaults. That in particular concerns me because it seems like people will build and test I/O stack pieces (ex. file parsing) expecting the .peek behavior of BytesIO then get something different when they read actual files/data...

Looking for more alternatives, the documentation page says (https://docs.python.org/3/library/io.html#io.BufferedReader.peek):

Return bytes from the stream without advancing the position. The number of bytes returned may be less or more than requested. If the underlying raw stream is non-blocking and the operation would block, returns empty bytes.

To me that leaves an intentional gap where we can always return less data than the total amount available. Peek gives no guarantees. That makes me wonder: Could we default to 0 and just always slice [:DEFAULT_BUFFER_SIZE]?

That would mean you always get DEFAULT_BUFFER_SIZE which matches default BufferedReader behavior unless there is less data available. If called in a loop yes that's a DEFAULT_BUFFER_SIZE repeated copy (BufferedIO also does that / the bytes requires a copy), but it's a lot less than "the whole buffer".

I’m fine with changing the default size, but let me describe what my thought was.

My concern with choosing a default size larger than 1 was that people might test their parsing code using BytesIO that relies on the buffer always having a certain size, but then when they use BufferedReader.peek in non-testing code it could fail to work because BufferedReader.peek only guarantees that you get a single byte (if available). And if you’re almost at the end of the buffer, this does happen. Copying from an earlier comment:

$ python3 -c 'f=open("README.rst", "rb");f.read(8191);print(f.peek())' b'r'

My thought was that by defaulting to size=1, people would be forced to at least think about this edge case. It felt like the more "conservative" choice.

That said, maybe peek is mostly used at the start of a file, and in that case, even BufferedReader.peek can be relied on to return not just a single byte.

Definitely sympathize with wanting to vary size over time... not sure of a simple way to do that in a single change though. For BufferedReader there end up being two cases here:

For non-blocking fd: peek() returns as much data as buffered currently (up to DEFAULT_BUFFER_SIZE), if there is no buffered data will attempt a read() to get some. If that returns no data / would block, will return empty buffer

For blocking : Similar to non-blocking with the caveat that no data available on many kinds of files on many systems, but not all, means "end of file".

For both those cases a length 1 peek return can happen as part of regular operation, as can shorter/longer.

I'm most used to peek being used in parsing algorithms where looking ahead is often used as a way to disambiguate. For things where need a fixed readahead though usually have to build some other mechanism so that the read buffer is "guaranteed filled" to a minimum size which CPYthon .peek() on Buffered at the moment doesn't provide.

One of my thoughts was potentially a feature extension on top of this work on BytesIO: Being able to specify a "buffer size" window which is read through which gives the "peek return length varies"... Ideally that in time could pair with changing BufferedReader .peek to try and guarantee a specific length of data available which would make lexers and parsers simpler to write.

I'm not sure the risk in changing defaults of arguments in CPython / would need to ask core developers with relevant experience there. In general CPython is pretty hesitant to make incompatible changes (https://peps.python.org/pep-0387/#making-incompatible-changes) and not sure where exactly changing a default arg from 1 to 0 would fit in that framework. That has me hesitant to start with a distinct value from BufferedReader for because it may be difficult to change later.

I do think giving your thought of "definitely not whole file" answer to BytesIO.peek is good design here. The best approach I have so far is to always just limit to a fixed length (which matches BufferedReader in some ways: You get the whole file if it fits). Open to other approaches as well; I have a preference to keep the design internal to BytesIO implementation rather than function signatures/default arguments as that is easier to change.

I reached out to some core developers to learn about the risks both from a "duck typing" and backwards incompatibility pieces. Will hopefully have some clear guidance shortly / find out if size=1 is actually risky from people with a lot more experience.

cmaloney · 2026-04-21T07:36:45Z

Also: Please extend the free threading test in Lib/test/test_free_threading/test_io.py to cover the new method

cmaloney · 2026-04-21T07:39:55Z

Overall looking good, definitely think this is in good shape. Just some weird corners (I've spent too much time reading I/O code...)

marcelm

Thanks for the comments! I tried to address them.

Also: Please extend the free threading test in Lib/test/test_free_threading/test_io.py to cover the new method

I have added something there, but wasn’t quite sure what I was doing; can you please have a look?

marcelm · 2026-04-21T09:18:38Z


      Return :class:`bytes` containing the entire contents of the buffer.

+   .. method:: peek(size=1, /)


memoryview came up in this comment and the three following it: #30808 (comment)

I don’t know what the conclusion is given your comment. Should a memoryview be returned instead? Most important to me is compatibility with what BufferedReader.peek() returns.

I am not too concerned about the extra memory for a bytes object as long as the default is size=1.

cmaloney · 2026-04-23T02:03:25Z


      Return :class:`bytes` containing the entire contents of the buffer.

+   .. method:: peek(size=1, /)


👍 to not using memoryview / definitely a good rationale in that thread.

On 3.14.0 BufferedReader (intentionally limiting buffer size) gives:

Python 3.14.0 (v3.14.0:ebf955df7a8, Oct 7 2025, 08:20:14) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> br = open('README.rst', 'rb', buffering=30) >>> def show_start_end(buf): ... print(f'{len(buf)=}|{buf=}') ... >>> show_start_end(br.peek()) len(buf)=30|buf=b'This is Python version 3.15.0 ' >>> show_start_end(br.peek(0)) len(buf)=30|buf=b'This is Python version 3.15.0 ' >>> show_start_end(br.peek(1)) len(buf)=30|buf=b'This is Python version 3.15.0 ' >>> _ = br.read(len(br.peek())) >>> show_start_end(br.peek()) len(buf)=30|buf=b'alpha 7\n======================'

Which doesn't match the behavior here or defaults. That in particular concerns me because it seems like people will build and test I/O stack pieces (ex. file parsing) expecting the .peek behavior of BytesIO then get something different when they read actual files/data...

Looking for more alternatives, the documentation page says (https://docs.python.org/3/library/io.html#io.BufferedReader.peek):

Return bytes from the stream without advancing the position. The number of bytes returned may be less or more than requested. If the underlying raw stream is non-blocking and the operation would block, returns empty bytes.

To me that leaves an intentional gap where we can always return less data than the total amount available. Peek gives no guarantees. That makes me wonder: Could we default to 0 and just always slice [:DEFAULT_BUFFER_SIZE]?

That would mean you always get DEFAULT_BUFFER_SIZE which matches default BufferedReader behavior unless there is less data available. If called in a loop yes that's a DEFAULT_BUFFER_SIZE repeated copy (BufferedIO also does that / the bytes requires a copy), but it's a lot less than "the whole buffer".

the-knights-who-say-ni added the CLA not signed label Jan 22, 2022

This comment was marked as outdated.

Sign in to view

bedevere-bot added the awaiting review label Jan 22, 2022

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Feb 1, 2022

marcelm force-pushed the fix-issue-46375 branch from 3111318 to 675718d Compare April 11, 2022 20:06

marcelm changed the title ~~bpo-46375: Implement BytesIO.peek()~~ gh-90533: Implement BytesIO.peek() Apr 11, 2022

This comment was marked as outdated.

Sign in to view

AlexWaygood added the topic-IO label Apr 14, 2022

ezio-melotti removed the CLA signed label Jul 13, 2022

marcelm force-pushed the fix-issue-46375 branch from 675718d to d4e849f Compare August 5, 2022 22:44

marcelm force-pushed the fix-issue-46375 branch from ef5eb69 to b82fa85 Compare August 5, 2022 23:28

marcelm force-pushed the fix-issue-46375 branch 2 times, most recently from 6998be1 to be39ff2 Compare November 6, 2022 15:11

AlexWaygood added the type-feature A feature request or enhancement label Nov 9, 2022

AlexWaygood self-requested a review November 9, 2022 13:37

kumaraditya303 requested a review from benjaminp November 27, 2022 07:18

AlexWaygood removed their request for review November 28, 2022 12:26

marcelm force-pushed the fix-issue-46375 branch from 5cb3143 to 79d5032 Compare September 19, 2023 08:04

vstinner reviewed Sep 21, 2023

View reviewed changes

marcelm added 10 commits April 10, 2026 23:22

Do not update pos if peek_bytes failed

04372bd

Size can be negative after truncate or seek

6b9ae8c

Test with size<0 and size>len(buf)

f7406f6

Test peek() after write()

d9528e2

Document BufferedReader.peek and BytesIO.peek similarly

bc8134b

Comment

b6ffca8

Make it more explicit that size is ignored

5fe5645

Return an empty bytes object for size=0

4126a64

Simplify

1ea40c2

Test peek(3) and peek(5)

77e04d6

marcelm force-pushed the fix-issue-46375 branch from 611bdaa to 77e04d6 Compare April 10, 2026 21:37

marcelm requested a review from AA-Turner as a code owner April 10, 2026 21:37

Run clinic.py

4d2f2dd

vstinner reviewed Apr 11, 2026

View reviewed changes

Comment thread Modules/_io/bytesio.c

Comment thread Modules/_io/bytesio.c Outdated

Comment thread Lib/test/test_io/test_memoryio.py Outdated

marcelm and others added 2 commits April 13, 2026 13:46

Apply suggestions from code review

c16bebf

Co-authored-by: Victor Stinner <vstinner@python.org>

Do not return an empty bytes object for size=0

08bd7da

Decorate BytesIO.peek with @critical_section

6174fca

cmaloney reviewed Apr 21, 2026

View reviewed changes

marcelm added 2 commits April 21, 2026 11:09

Test peek returns EOF after seeking to EOF

b8b8cf4

Free-threading test for peek

7ac914e

marcelm commented Apr 21, 2026

View reviewed changes

cmaloney reviewed Apr 23, 2026

View reviewed changes

Fix free-threading test

abbd8f0


		Return :class:`bytes` containing the entire contents of the buffer.

		.. method:: peek(size=1, /)

Uh oh!

Conversation

marcelm commented Jan 22, 2022 • edited by AlexWaygood Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

This comment was marked as outdated.

bedevere-bot commented Aug 5, 2022

Uh oh!

marcelm commented Nov 9, 2022

Uh oh!

AlexWaygood commented Nov 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marcelm commented Nov 9, 2022

Uh oh!

AlexWaygood commented Nov 28, 2022

Uh oh!

marcelm commented Nov 28, 2022

Uh oh!

awalgarg commented Jul 8, 2023

Uh oh!

marcelm commented Jul 8, 2023

Uh oh!

awalgarg commented Jul 8, 2023

Uh oh!

AA-Turner commented Sep 21, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vstinner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marcelm commented Apr 13, 2026

Uh oh!

marcelm commented Apr 15, 2026

Uh oh!

vstinner commented Apr 15, 2026

Uh oh!

marcelm commented Apr 15, 2026

Uh oh!

marcelm commented Apr 21, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cmaloney commented Apr 21, 2026

Uh oh!

cmaloney commented Apr 21, 2026

Uh oh!

marcelm left a comment

Choose a reason for hiding this comment

marcelm commented Jan 22, 2022 •

edited by AlexWaygood

Loading

AlexWaygood commented Nov 9, 2022 •

edited

Loading